Expression Templates and OpenCL
نویسندگان
چکیده
In this paper we discuss the interaction of expression templates with OpenCL devices. We show how the expression tree of expression templates can be used to generate problem specific OpenCL kernels. In a second approach we use expression templates to optimize the data transfer between the host and the device which leads to a measurable performance increase in a domain specific language approach. We tested the functionality, correctness and performance for both implementations in a case study for vector and matrix operations.
منابع مشابه
OpenCL C + + Benedict
With the success of programming models such as Khronos’ OpenCL, heterogeneous computing is going mainstream. However, these models are low-level, even when considering them as systems programming models. For example, OpenCL is effectively an extended subset of C99, limited to the type unsafe procedural abstraction that C has provided for more than 30 years. Computer systems programming has for ...
متن کاملProgramming GPUs with C++14 and Just-In-Time Compilation
Systems that comprise accelerators (e.g., GPUs) promise high performance, but their programming is still a challenge, mainly because of two reasons: 1) two distinct programming models have to be used within an application: one for the host CPU (e.g., C++), and one for the accelerator (e.g., OpenCL or CUDA); 2) using Just-In-Time (JIT) compilation and its optimization opportunities in both OpenC...
متن کاملPerformance Portability in Accelerated Parallel Kernels
Heterogeneous architectures, by definition, include multiple processing components with very different microarchitectures and execution models. In particular, computing platforms from supercomputers to smartphones can now incorporate both CPU and GPU processors. Disparities between CPU and GPU processor architectures have naturally led to distinct programming models and development patterns for...
متن کاملAutomatic Translation of Cuda to Opencl and Comparison of Performance Optimizations on Gpus
As an open, royalty-free framework for writing programs that execute across heterogeneous platforms, OpenCL gives programmers access to a variety of data parallel processors including CPUs, GPUs, the Cell and DSPs. All OpenCL-compliant implementations support a core specification, thus ensuring robust functional portabiity of any OpenCL program. This thesis presents the CUDAtoOpenCL source-to-s...
متن کاملCaKernel - A GPGPU Kernel Asbtraction and Implementation for Scientific Computing on Heterogeneous Systems
We presented our work to design and implement a GPGPU kernel abstraction, which is suitable for developing highly efficient large scale scientific applications using stencil computations on hybrid CPU/GPU systems. By leveraging the MPI-based data parallelism implemented in Cactus, we have developed a CaKernel programming framework in the CUDA/OpenCL architecture to facilitate the development pr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011